
fix meta_eval after refactor and add new meta_mmlu_instruct task for 3.2 #862

Merged 2 commits into main on Jan 28, 2025

Conversation

@wukaixingxp (Contributor) commented Jan 21, 2025

What does this PR do?

This PR fixes meta_eval after the refactor by setting the correct path and updating the MATH dataset URL. It also splits the 3.2 MMLU task into meta_mmlu_pretrain and meta_mmlu_instruct, tested as below:

Feature/Issue validation/testing

Please describe the tests that you ran to verify your changes and summarize the relevant results. Provide instructions so the tests can be reproduced, and list any relevant details of your test configuration.

  • add new meta_mmlu_instruct for 3.2 (a reproduction command is sketched after the results table below)
vllm (pretrained=meta-llama/Llama-3.2-3B-Instruct,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.4,data_parallel_size=1,max_model_len=8192,add_bos_token=True,seed=42), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto
|        Tasks        |Version|   Filter   |n-shot|  Metric   |   |Value |   |Stderr|
|---------------------|-------|------------|-----:|-----------|---|-----:|---|-----:|
|meta_instruct        |    N/A|            |      |           |   |      |   |      |
| - meta_gpqa         |      1|strict-match|     0|exact_match|↑  |0.3326|±  |0.0223|
| - meta_math         |      1|none        |     0|exact_match|↑  |0.4514|±  |0.0070|
| - meta_mmlu_instruct|      1|strict-match|     0|exact_match|↑  |0.6368|±  |0.0041|
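
For reference, a run like the one above maps onto the lm-evaluation-harness CLI. A minimal reproduction sketch, assuming the meta_eval task configs have already been prepared into a local directory (`./work_dir` and `eval_results` are placeholder paths, not paths from this PR):

```bash
lm_eval --model vllm \
  --model_args pretrained=meta-llama/Llama-3.2-3B-Instruct,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.4,data_parallel_size=1,max_model_len=8192,add_bos_token=True,seed=42 \
  --tasks meta_instruct \
  --batch_size auto \
  --include_path ./work_dir \
  --log_samples \
  --output_path eval_results
```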
  • test on 3B meta_mmlu_pretrain (the matching command variant follows the table below)
2025-01-28:14:28:57,156 INFO     [evaluation_tracker.py:287] Saving per-sample results for: meta_mmlu_pretrain
vllm (pretrained=meta-llama/Llama-3.2-3B,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.4,data_parallel_size=1,max_model_len=8192,add_bos_token=True,seed=42), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto
|        Tasks        |Version|Filter|n-shot| Metric |   |Value|   |Stderr|
|---------------------|-------|------|-----:|--------|---|----:|---|-----:|
|meta_pretrain        |    N/A|      |      |        |   |     |   |      |
| - meta_mmlu_pretrain|      1|none  |     0|acc     |↑  |0.566|±  |0.0042|
|                     |       |none  |     0|acc_norm|↑  |0.566|±  |0.0042|
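
The pretrain run differs only in the checkpoint and task group; a sketch under the same assumptions as above:

```bash
lm_eval --model vllm \
  --model_args pretrained=meta-llama/Llama-3.2-3B,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.4,data_parallel_size=1,max_model_len=8192,add_bos_token=True,seed=42 \
  --tasks meta_pretrain \
  --batch_size auto \
  --include_path ./work_dir \
  --log_samples \
  --output_path eval_results
```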

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Thanks for contributing 🎉!

@wukaixingxp (Contributor, Author) commented:

add mmlu_instruct for 3.2

vllm (pretrained=meta-llama/Llama-3.2-3B-Instruct,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.4,data_parallel_size=1,max_model_len=8192,add_bos_token=True,seed=42), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto
|      Tasks       |Version|   Filter   |n-shot|  Metric   |   |Value |   |Stderr|
|------------------|------:|------------|-----:|-----------|---|-----:|---|-----:|
|meta_mmlu_instruct|      1|strict-match|     0|exact_match|↑  |0.6368|±  |0.0041|
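
For context, lm-evaluation-harness tasks are registered through YAML configs, so the new meta_mmlu_instruct task would be defined in one. A rough sketch of the shape such a config takes; the dataset path, split name, doc_to_text helper, and extraction regex below are illustrative assumptions (only the strict-match filter and exact_match metric come from the results above), not the merged file:

```yaml
# Illustrative sketch only: dataset_path, test_split, the doc_to_text helper,
# and the regex are assumptions; see the merged meta_eval config for real values.
task: meta_mmlu_instruct
dataset_path: meta-llama/Llama-3.2-3B-Instruct-evals   # assumed source dataset
test_split: latest                                     # assumed split name
output_type: generate_until
doc_to_text: !function utils.doc_to_text               # hypothetical helper
doc_to_target: gold                                    # assumed target field
filter_list:
  - name: strict-match                                 # filter name from the table above
    filter:
      - function: regex
        regex_pattern: 'best answer is ([A-D])'        # assumed extraction pattern
      - function: take_first
metric_list:
  - metric: exact_match                                # metric from the table above
    aggregation: mean
    higher_is_better: true
```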

@wukaixingxp wukaixingxp marked this pull request as ready for review January 28, 2025 22:33
@wukaixingxp wukaixingxp requested a review from init27 January 28, 2025 22:33
@wukaixingxp wukaixingxp changed the title from "fix meta_eval after refactor" to "fix meta_eval after refactor and add new meta_mmlu_instruct task for 3.2" Jan 28, 2025
@init27 init27 merged commit 6bfd034 into main Jan 28, 2025
4 checks passed
@wukaixingxp wukaixingxp self-assigned this Jan 28, 2025